NSF PAR Search | NSF Public Access Repository

Predicting antimicrobial resistance of bacterial pathogens using time series analysis

https://doi.org/10.3389/fmicb.2023.1160224

Kim, Jeonghoon; Rupasinghe, Ruwini; Halev, Avishai; Huang, Chao; Rezaei, Shahbaz; Clavijo, Maria J.; Robbins, Rebecca C.; Martínez-López, Beatriz; Liu, Xin (May 2023, Frontiers in Microbiology)

Antimicrobial resistance (AMR) is arguably one of the major health and economic challenges in our society. A key aspect of tackling AMR is rapid and accurate detection of the emergence and spread of AMR in food animal production, which requires routine AMR surveillance. However, AMR detection can be expensive and time-consuming considering the growth rate of the bacteria and the most commonly used analytical procedures, such as Minimum Inhibitory Concentration (MIC) testing. To mitigate this issue, we utilized machine learning to predict the future AMR burden of bacterial pathogens. We collected pathogen and antimicrobial data from >600 farms in the United States from 2010 to 2021 to generate AMR time series data. Our prediction focused on five bacterial pathogens (Escherichia coli, Streptococcus suis, Salmonella sp., Pasteurella multocida, and Bordetella bronchiseptica). We found that Seasonal Auto-Regressive Integrated Moving Average (SARIMA) outperformed five baselines, including Auto-Regressive Moving Average (ARMA) and Auto-Regressive Integrated Moving Average (ARIMA). We hope this study provides valuable tools to predict the AMR burden not only of the pathogens assessed in this study but also of other bacterial pathogens.
Full Text Available
On the Difficulty of Membership Inference Attacks

https://doi.org/10.1109/CVPR46437.2021.00780

Rezaei, Shahbaz; Liu, Xin (June 2021, International Conference on Computer Vision and Pattern Recognition (CVPR))

Full Text Available
Applications of Machine Learning for the Classification of Porcine Reproductive and Respiratory Syndrome Virus Sublineages Using Amino Acid Scores of ORF5 Gene

https://doi.org/10.3389/fvets.2021.683134

Kim, Jeonghoon; Lee, Kyuyoung; Rupasinghe, Ruwini; Rezaei, Shahbaz; Martínez-López, Beatriz; Liu, Xin (July 2021, Frontiers in Veterinary Science)

Porcine reproductive and respiratory syndrome is an infectious disease of pigs caused by PRRS virus (PRRSV). A modified live-attenuated vaccine has been widely used to control the spread of PRRSV, and the classification of field strains is key to successful control and prevention. Restriction fragment length polymorphism targeting the open reading frame 5 (ORF5) gene is widely used to classify PRRSV strains but has shown unstable accuracy. Phylogenetic analysis is a powerful tool for PRRSV classification with consistent accuracy, but it demands large computational power as the number of sequences increases. Our study aimed to apply four machine learning (ML) algorithms (random forest, k-nearest neighbor, support vector machine, and multilayer perceptron) to classify field PRRSV strains into four clades using amino acid scores based on the ORF5 gene sequence. Our study used amino acid sequences of the ORF5 gene in 1931 field PRRSV strains collected in the US from 2012 to 2020. Phylogenetic analysis was used to label field PRRSV strains into one of four clades: Lineage 5 or three clades in Lineage 1. We measured the accuracy and time consumption of classification using the four ML approaches for different sizes of gene sequences. We found that all four ML algorithms classify a large number of field strains in a very short time (<2.5 s) with very high accuracy (>0.99 area under the receiver operating characteristic curve). Furthermore, the random forest approach detects a total of 4 key amino acid positions for the classification of field PRRSV strains into four clades. Our findings will provide insight for developing a rapid and accurate classification model using genetic information, which also enables us to handle large genome datasets in real time or semi-real time for data-driven decision-making and more timely surveillance.
Full Text Available
Multitask Learning for Network Traffic Classification

https://doi.org/10.1109/ICCCN49398.2020.9209652

Rezaei, Shahbaz; Liu, Xin (August 2020, 29th International Conference on Computer Communications and Networks (ICCCN))
Traffic classification has various applications in today's Internet, from resource allocation, billing, and QoS purposes in ISPs to firewall and malware detection in clients. Classical machine learning algorithms and deep learning models have been widely used to solve the traffic classification task. However, training such models requires a large amount of labeled data. Labeling data is often the most difficult and time-consuming process in building a classifier. To solve this challenge, we reformulate traffic classification as a multi-task learning framework where the bandwidth requirement and duration of a flow are predicted along with the traffic class. The motivation for this approach is twofold: First, the bandwidth requirement and duration are useful in many applications, including routing, resource allocation, and QoS provisioning. Second, these two values can be obtained from each flow easily without the need for human labeling or capturing flows in a controlled and isolated environment. We show that with a large amount of easily obtainable data samples for the bandwidth and duration prediction tasks, and only a few data samples for the traffic classification task, one can achieve high accuracy. Therefore, our proposed multi-task learning framework obviates the need for a large labeled traffic dataset. We conduct two experiments with the ISCX and QUIC public datasets and show the efficacy of our approach.
Full Text Available
A Target-Agnostic Attack on Deep Models: Exploiting Security Vulnerabilities of Transfer Learning

Rezaei, Shahbaz; Liu, Xin (January 2020, The International Conference on Learning Representations)

Due to insufficient training data and the high computational cost of training a deep neural network from scratch, transfer learning has been extensively used in many deep-neural-network-based applications. A commonly used transfer learning approach involves taking a part of a pre-trained model, adding a few layers at the end, and re-training the new layers with a small dataset. This approach, while efficient and widely used, introduces a security vulnerability because the pre-trained model used in transfer learning is usually publicly available, including to potential attackers. In this paper, we show that without any additional knowledge other than the pre-trained model, an attacker can launch an effective and efficient brute force attack that can craft instances of input to trigger each target class with high confidence. We assume that the attacker has no access to any target-specific information, including samples from target classes, the re-trained model, and the probabilities assigned by Softmax to each class, thus making the attack target-agnostic. These assumptions render all previous attack models inapplicable, to the best of our knowledge. To evaluate the proposed attack, we perform a set of experiments on face recognition and speech recognition tasks and show the effectiveness of the attack. Our work reveals a fundamental security weakness of the Softmax layer when used in transfer learning settings.
Full Text Available
Deep Learning for Encrypted Traffic Classification: An Overview

Rezaei, Shahbaz; Liu, Xin (January 2019, IEEE Communications Magazine)

Traffic classification has been studied for two decades and applied to a wide range of applications, from QoS provisioning and billing in ISPs to security-related applications in firewalls and intrusion detection systems. Port-based, data packet inspection, and classical machine learning methods have been used extensively in the past, but their accuracy has declined due to the dramatic changes in Internet traffic, particularly the increase in encrypted traffic. With the proliferation of deep learning methods, researchers have recently investigated these methods for the traffic classification task and reported high accuracy. In this article, we introduce a general framework for deep-learning-based traffic classification. We present commonly used deep learning methods and their application in traffic classification tasks. Then, we discuss open problems, challenges, and opportunities for traffic classification.
Full Text Available
